JEIDA's Bilingual Corpus and Other Corpora for NLP Research in Japan

نویسنده

  • Hitoshi ISAHARA
چکیده

The committee on text processing technology of JEIDA (Japan Electronics Industry Development Association) has been developing its bilingual corpus for research on machine translation systems since the 1996 Japanese fiscal year. An overview of this bilingual corpus is presented in this paper. And other linguistic data recently developed in Japan, which includes the RWC text database and the simple sentence data by the CRL and IPA.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

JEIDA's English-Japanese Bilingual Corpus Project

JEIDA (Japan Electronics Industry Development Association) has been developing a large bilingual aligned corpus for research in NLP, since the 1996 Japanese fiscal year. In fiscal year 1996, JEIDA did a feasibility study and received permission from the Japanese Ministries to create such a resource. JEIDA, then, made a "small" sentence aligned corpus in fiscal year 1997. JEIDA's new project sta...

متن کامل

Multilingual Document Alignment - A Study with Chinese and Japanese

Natural language processing (NLP) community is increasingly using paralleland comparablecorpora for cross-linguistic research. The knowledge extracted from such corpora helps us in cross-language information retrieval, topic detection and tracking, machine translation, and many other NLP tasks. Parallel or comparable corpora of JapaneseChinese language-pair are rare. We investigate an automatic...

متن کامل

Bilingual Parallel Active Learning Between Chinese and English

Active learning is an effective machine learning paradigm which can significantly reduce the amount of labor for manually annotating NLP corpora while achieving competitive perfor-mance. Previous studies on active learning are focused on corpora in one single language or two languages translated from each other. This paper proposes a Bilingual Parallel Active Learning paradigm (BPAL), where an ...

متن کامل

Collocation Translation Acquisition Using Monolingual Corpora

Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods using bilingual parallel corpora, this paper presents a new method for acquiring collocation translations by making use of monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency ...

متن کامل

Bilingual Active Learning for Relation Classification via Pseudo Parallel Corpora

Active learning (AL) has been proven effective to reduce human annotation efforts in NLP. However, previous studies on AL are limited to applications in a single language. This paper proposes a bilingual active learning paradigm for relation classification, where the unlabeled instances are first jointly chosen in terms of their prediction uncertainty scores in two languages and then manually l...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006